Optimizing information flow in small genetic networks.

IV. Spatial coupling 
Institute of Science and Technology Austria, Am Campus 1, A-3400 Klosterneuburg, Austria

We typically think of cells as responding to external signals independently by regulating their 
gene expression levels, yet they often locally exchange information and coordinate. Can such spatial 
coupling be of benefit for conveying signals subject to gene regulatory noise? Here we extend our 
information-theoretic framework for gene regulation to spatially extended systems. As an example, 
we consider a lattice of nuclei responding to a concentration field of a transcriptional regulator 
(the “input”) by expressing a single diffusible target gene. When input concentrations are low, 
diffusive coupling markedly improves information transmission; optimal gene activation functions 
also systematically change. A qualitatively new regulatory strategy emerges where individual cells 
respond to the input in a nearly step-like fashion that is subsequently averaged out by strong 
diffusion. While motivated by early patterning events in the Drosophila embryo, our framework is 
generically applicable to spatially coupled stochastic gene expression models. 


I. INTRODUCTION 

Signals encoded by spatial or temporal concentration profiles of certain proteins play a crucial
role in communicating information within and between cells. Such signals are established and read
out by a vast variety of gene regulatory networks. In this process, however, information flow is
limited by the randomness associated with gene regulatory interactions, which are essentially
chemical reactions between different species of signaling molecules and DNA taking place at very
low copy numbers [1]. While we increasingly understand distinct strategies of noise control in
biological systems [2-6], it remains largely unclear how nature orchestrates these strategies to
maximize information flow.

A recent proposal takes the idea of “information flow” in biological networks seriously, as
formalized by Shannon’s information theory, and hypothesizes that at least some of the structure
of genetic regulatory networks could be derived mathematically, by maximizing the information
transmission that such networks can sustain subject to biophysically realistic constraints [7].
The feasibility of information as the metric of network performance was recently established by
demonstrating that morphogen patterns in early fly development transmit information with an
accuracy close to the physical limits [8, 9]. Theoretically, this optimization principle was
applied to study information flow in different paradigmatic gene-regulatory scenarios, including
single and multiple target genes regulated by a single input [10-12], feed-forward
cross-regulation [12], autoactivation [13], multistate promoter switching [14], and
time-dependent inputs [15-19]. These attempts yielded important insights into optimal strategies
with which single cells can reliably respond to external stimuli. The issue that to date has not
received attention, however, is that many cells or nuclei can respond to a spatially distributed
signal collectively, by exchanging information among themselves, e.g., via diffusion of signaling
proteins. Prominent examples are the specification of emerging body tissues during embryo
development [20-25], the aggregation response in colonies of amoebae [26, 27], and collective
chemosensing in cell colonies and tissues [28, 29]. Importantly, spatial aspects not only add
another layer of complexity to the distributions of input and output signals of gene-regulatory
networks; they can also markedly alter noise levels, and thus information capacity, through
spatial averaging [30-35]. A truly predictive theoretical framework therefore must take into
account the spatial setting, where multiple regulatory networks can read out the inputs at various
locations and exchange information between themselves locally.

Here we extend the existing theory of optimal gene regulatory networks to such a spatial setting.
We construct a spatial-stochastic model of gene regulation that takes into account diffusive
coupling between neighboring reaction volumes and the relevant sources of noise. We derive simple
expressions for the stationary means and variances of input-activated local protein levels as a
function of their regulatory parameters and diffusion rate. From this we compute information
transmission in various spatial scenarios and systematically explore how transmission can be
optimized by changing key system parameters. As a motivating example, we study a model variant
representative of the bicoid-hunchback system in early fruit fly embryogenesis, which enables
differentiating cells to acquire position-dependent fates with high reliability and
reproducibility in spite of stochasticity in the underlying biophysical processes [32, 36-42].
Our approach thus specifically optimizes positional information, a notion that has been
extensively discussed throughout developmental biology [43-47], but only recently rigorously
formalized in mathematical terms, using information theory [48]; the proposed formalism, however,
generalizes to other spatially coupled gene regulatory setups.

We find that spatial coupling can markedly improve information transmission when the dominant
source of noise in gene regulation is on the “input side,” e.g., due to the random arrival of the
regulatory molecules (transcription factors) at their binding sites on the DNA. This is in
accordance with the known fact that spatial averaging can remove super-Poissonian contributions
to the noise [30, 31]. Moreover, with diffusion the maximal information capacity reached upon
optimization becomes increasingly invariant to the spatial details of the input signal. Finally,
in a defined noise regime that we detail below, we observe the emergence of a novel optimal
regulatory strategy where diffusion generates graded, and thus informative, spatial profiles of
regulated genes despite their sharp, threshold-like activation by the regulatory inputs.


II. METHODS 


Here we construct a model in which we can analytically calculate how much information expression
levels of diffusible proteins encode locally about the position itself, if these proteins are
expressed in response to a concentration field of regulatory input molecules. The outline of our
approach is as follows. We (i) define a generic spatial-stochastic model of gene regulation that
explicitly accounts for diffusive signaling between adjacent volumes which collectively interpret
a spatially distributed input signal (Section II A). Starting from a set of coupled Langevin
equations, in Section II B we (ii) derive general analytic expressions for the stationary means
and variances of the regulated protein levels, and provide intuitive insight into the role of
diffusion prior to specifying any gene regulatory details. We then (iii) impose realistic
expressions for the gene regulatory function and noise, based on well-established biochemical
models (Section II C). Section II D illustrates how we (iv) quantify the encoding of positional
information, using information theory and the stationary moments derived in (ii). Finally, we
vary the parameters of the model to find the optimal regulatory strategies in the spatially
coupled setup, which we present in the “Results” Section III.


A. Spatial-stochastic model of gene regulation 

Figure 1 shows a schematic of the spatially coupled gene regulation model that we study
throughout this paper. We consider a two-dimensional lattice of $N_x \times N_y$ reaction volumes
with equal spacing $\delta$ between the volumes. In this work, we particularly focus on a
cylindrical geometry of the lattice, with periodic boundary conditions along the circumference
($y$-direction), and reflective boundary conditions in the axial ($x$-) direction. While this
geometry deliberately mimics the syncytial state of the developing embryo of the fruit fly
Drosophila, our model could be easily adapted to any arbitrary discrete arrangement of equally
spaced reaction volumes. We will initially consider the special case $N_y = 1$ as the “1D model,”
while referring to the case $N_y > 1$ as the “2D model.”

FIG. 1. Model schematic. The spatial-stochastic model of gene regulation consists of a discrete
set of identical reaction volumes (indicated by $i$) arranged in a regular lattice with lattice
spacing $\delta$. While here for clarity we only draw a 1D lattice, we also study a 2D model with
periodic boundary conditions, and our framework supports arbitrary coordination numbers. The
volumes collectively sense a spatial signal $c(x)$ which maps the position $x$ to an input
concentration $c$ (red), creating a spatial distribution of output proteins $G$ (blue). Each
volume contains an identical promoter activated with probability $f(c_i)$, where $f(c)$ is the
regulatory function and $c_i$ the activator concentration in volume $i$. This results in a local
production rate $r_i = r_{\max} f(c_i)$ and local levels $G_i$ of the product protein. $G$
proteins are degraded with a rate $1/\tau$ and can hop to the neighboring volumes with a rate
$h = D/\delta^2$, where $D$ is their diffusion coefficient. The regulatory function $f(c)$ is
characterized by the activation threshold $K$ and the regulatory cooperativity $H$, as detailed
in Section II C.

Each volume contains the promoter of an identical gene controlled by a single transcription
factor, whose concentration we denote by $c$. The copy number of proteins expressed from this
gene of interest will be denoted by $G$, while $g = G/N_{\max}$ will refer to the copy number
normalized by the maximal average expression level $N_{\max}$ at full induction. At the core of
our model is the regulatory function $f(c)$, which determines how the activator concentration $c$
maps to the effective synthesis rate of $G$ proteins. We will also refer to $c$ as the “input,”
and to $G$ and $g$ as the “output.” For simplicity we treat transcription and translation as a
single production step with maximal production rate $r_{\max}$ reached at full activation. $G$
molecules are degraded with a constant rate $1/\tau$, where $\tau$ is the average protein
lifetime. In addition, each volume can exchange the product molecules $G$ with its neighboring
volumes via diffusion. Here, we approximate this exchange with an effective “hopping rate”
$h = D/\delta^2$, where $D$ is the diffusion coefficient of $G$ molecules.

Denoting the position of volume $i$ by $\mathbf{x}_i$, we can write down a stochastic equation
for the combined production-degradation-diffusion process in volume $i$ as follows:

$$\partial_t G_i = r_{\max} f(c(\mathbf{x}_i)) - \frac{1}{\tau} G_i - h \sum_{n \in N(i)} (G_i - G_n) + \sum_k \Gamma_{ik} \eta_k \tag{1}$$

Here $G_i$ denotes the number of $G$ proteins in the volume, while the first and second term in
the equation describe their production and degradation, respectively. The third term is a
discretized Laplacian accounting for the diffusive exchange of $G$ molecules between volume $i$
and its nearest-neighbor volumes $n \in N(i)$. The function $c(\mathbf{x})$ maps positions
$\mathbf{x}_i$ to local transcription factor concentrations $c_i$. In this work we consider the
case in which $c(\mathbf{x})$ is invariant in the $y$-direction, i.e., $c_i = c(\mathbf{x}_i) =
c(x_i)$; we specify the functional form of $c(x)$ at the end of this section. The noise term
$\sum_k \Gamma_{ik} \eta_k$ comprises all noise sources $\eta_k$ that act on the protein level
$G_i$, including those outside of volume $i$. We assume that all fluctuations are well
represented by non-multiplicative, zero-mean Gaussian white noise, and compute the relevant noise
powers $\Gamma_{ik}^2$ below.


B. Means and variances with spatial coupling 

The set of coupled Langevin equations [Eq. (1)] is a special case of what is known as the
“N-unit generalized Langevin model” gHISD] in other contexts. To compute the steady-state means
and variances we employ a technique based on Ito calculus which allows us to convert the set of
coupled stochastic equations for the variables $G_i$ into a set of ODEs for their moments that
hold exactly. This can be performed for an arbitrary discrete coupling network, and even for the
case of multiplicative noise (not considered here) phTHbTI.

We can write down the equations of motion for the mean of the normalized local output
$g_i = G_i/N_{\max}$ and the (co)variances of its fluctuations $\delta g_i = g_i - \langle g_i
\rangle$ as follows jifMxT]:

$$\partial_t \langle g_i \rangle = \langle F(g_i, c_i) \rangle - h \sum_{n_i \in N(i)} \langle g_i - g_{n_i} \rangle \tag{2}$$

$$\partial_t \langle \delta g_i \delta g_j \rangle = \langle \delta g_i F(g_j, c_j) \rangle + \langle \delta g_j F(g_i, c_i) \rangle + h \Big\langle \delta g_i \sum_{n_j \in N(j)} (g_{n_j} - g_j) \Big\rangle + h \Big\langle \delta g_j \sum_{n_i \in N(i)} (g_{n_i} - g_i) \Big\rangle + \sum_k \langle \Gamma_{ik} \Gamma_{jk} \rangle \tag{3}$$

Here the function $F(g_i, c_i) \equiv \frac{1}{\tau} (f(c_i) - g_i)$ groups the production and
degradation terms and depends on the regulatory function $f(c)$. The $n_i$- and $n_j$-sums run
over the nearest-neighbor volumes of $i$ and $j$, respectively. Note that in the $k$-sum, which
runs over all noise sources in the system, most terms will usually be zero.

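The hitherto arbitrary noise powers can be written out more explicitly by listing all noise
sources that affect volume $i$ in the following way:

$$\sum_k \Gamma_{ik} \eta_k = \mathcal{G}_i(c_i, g_i)\, \gamma_i - \sum_{n \in N(i)} \mathcal{G}_{(i \to n)}\, \xi_{(i \to n)} + \sum_{n \in N(i)} \mathcal{G}_{(n \to i)}\, \xi_{(n \to i)} \tag{4}$$

Herein $\gamma_i$ is the noise in the combined process of random regulation, production and
degradation, and $\xi_{(i \to j)}$ the noise caused by random hopping from volume $i$ to $j$,
with $\mathcal{G}_i$ and $\mathcal{G}_{(i \to j)}$ the respective noise amplitudes. A crucial
step in specifying the expression above is to write the noise contributions of the incoming and
outgoing hopping processes with different signs, because each hopping event causing a negative
fluctuation in the volume of origin always causes a positive fluctuation in the volume of
arrival. The hopping thus is a birth process when seen from the volume of arrival, and a death
process when seen from the volume of origin. This intuitive argument can be derived more
rigorously from the Langevin equation following the method portrayed by Gillespie [52], and is
quantitatively confirmed by stochastic simulations employing, e.g., the next-subvolume method
[53, 54].

Using Eq. (4) we can rewrite the noise term $\sum_k \langle \Gamma_{ik} \Gamma_{jk} \rangle$ in
Eq. (3) for the variances (case $i = j$) and nearest-neighbor covariances (case $i \in N(j)$) as
follows:

$$\sum_k \langle \Gamma_{ik} \Gamma_{jk} \rangle = \langle \mathcal{G}_i^2(c_i, g_i) \rangle + \sum_{n \in N(i)} \left( \langle \mathcal{G}_{(i \to n)}^2 \rangle + \langle \mathcal{G}_{(n \to i)}^2 \rangle \right) \equiv \mathcal{N}_{ii}^2 \quad \text{for } i = j \tag{5}$$

$$\sum_k \langle \Gamma_{ik} \Gamma_{jk} \rangle = \underbrace{\langle \Gamma_{i\kappa} \Gamma_{j\kappa} \rangle}_{\text{hopping } (i \to j)} + \underbrace{\langle \Gamma_{i\kappa'} \Gamma_{j\kappa'} \rangle}_{\text{hopping } (j \to i)} = -\langle \mathcal{G}_{(i \to j)}^2 \rangle - \langle \mathcal{G}_{(j \to i)}^2 \rangle \equiv \mathcal{N}_{ij}^2 \quad \text{for } i \in N(j) \tag{6}$$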

To obtain the stationary means and covariances of the protein number $g_i$, we solved Eqs. (2)
and (3) in steady state by setting their left sides to zero ($\partial_t \langle \cdot \rangle =
0$). After substituting $g_i = \langle g_i \rangle + \delta g_i$, we regroup covariances
$\langle \delta g_i \delta g_j \rangle$ and means $\langle g_i \rangle$ and relate them to the
respective quantities in the neighboring volumes (see Appendix A). In this context, we introduce
the pragmatic assumption that only concentrations in volumes that are nearest neighbors on the
lattice have significant correlations, i.e., we enforce

$$\langle \delta g_i \delta g_j \rangle = 0 \quad \text{for } j \notin N(i),\ j \neq i \tag{7}$$

While this “short correlations assumption” markedly facilitates both analytical progress and
computation speed, and yields formulas that are intuitive to interpret, our results only change
marginally when the assumption is relaxed (see Section III D). For brevity, in the following we
write

$$\bar{g}_i \equiv \langle g_i \rangle, \quad \sigma_i^2 \equiv \langle \delta g_i^2 \rangle, \quad C_{ij} \equiv \langle \delta g_i \delta g_j \rangle \tag{8}$$

for the steady-state means, variances and covariances, respectively. Treating the cases $i = j$
and $i \in N(j)$ in Eq. (3) separately, we arrive at the following set of coupled equations for
these quantities (cf. Appendix A):


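$$\bar{g}_i = \frac{\tau}{1 + 2d\Delta} \Big( r_i + h \sum_{n_i \in N(i)} \bar{g}_{n_i} \Big) = T r_i + \Lambda^2 \sum_{n_i \in N(i)} \bar{g}_{n_i} \tag{9}$$

$$\sigma_i^2 = \Lambda^2 \sum_{n_i \in N(i)} C_{i n_i} + \frac{T}{2} \mathcal{N}_{ii}^2 \tag{10}$$

$$C_{ij} = \frac{\Lambda^2}{2} \left( \sigma_i^2 + \sigma_j^2 \right) + \frac{T}{2} \mathcal{N}_{ij}^2 \tag{11}$$

Here we abbreviate the normalized production rate $r_i \equiv f(c_i)/\tau$; $d$ is the lattice
dimension, or half of the coordination number of a reaction volume. We introduce the
dimensionless diffusion constant $\Delta \equiv D/D_0$, which is equal to the diffusion constant
measured in the “natural” unit $D_0 \equiv \delta^2/\tau$; note that $\Delta = h\tau$. We will
also refer to $\Delta$ as the “spatial coupling.”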

The above equations define two “effective parameters”: (i) an “effective residence time”
$T \equiv \tau/(1 + 2d\Delta)$, reflecting the fact that with diffusion proteins are removed from
the reaction volume faster than through regular degradation alone; and (ii) the (dimensionless)
“mixing parameter” $\Lambda \equiv \sqrt{hT} = \sqrt{DT}/\delta = \sqrt{\Delta/(1 + 2d\Delta)}$,
which equals the distance travelled via diffusion during the time $T$, measured in units of the
lattice spacing $\delta$. These quantities capture the essential effect of diffusive coupling
between the stochastic processes in neighboring reaction volumes: with increasing coupling
($\Delta > 0 \Rightarrow \Lambda^2 > 0$), the mean output level $\bar{g}_i$ is increasingly set
by the mean levels in the neighboring volumes $\bar{g}_{n_i}$, whereas the contribution of the
local production to the apparent copy number in the volume is reduced ($T < \tau$). An analogous
interpretation holds for $\sigma_i^2$ if $\mathcal{N}_{ii}^2$ is interpreted as a “noise
production” term.
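
The two effective parameters are simple enough to compute directly. The following is a minimal sketch, not part of the original analysis; the function name `effective_parameters` and its argument names are hypothetical, and it only encodes the definitions of the hopping rate, the dimensionless diffusion constant, the effective residence time and the mixing parameter given above.

```python
import math


def effective_parameters(D, delta, tau, d=1):
    """Return (Delta, T, Lambda) for diffusion coefficient D, lattice spacing
    delta, protein lifetime tau and lattice dimension d.

    Delta  = h * tau              (dimensionless diffusion constant)
    T      = tau / (1 + 2 d Delta)  (effective residence time)
    Lambda = sqrt(h * T)          (dimensionless mixing parameter)
    """
    h = D / delta**2                  # hopping rate between neighboring volumes
    Delta = h * tau
    T = tau / (1.0 + 2 * d * Delta)
    Lambda = math.sqrt(h * T)
    return Delta, T, Lambda
```

Under these definitions the squared mixing parameter equals Delta / (1 + 2 d Delta), which saturates at 1/(2d) for very strong coupling; in that limit the mean in each bulk volume approaches the plain average over its 2d neighbors.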

We verified that Eqs. (10) and (11) correctly reproduce the known result that all
super-Poissonian noise is attenuated in the “Poissonian limit,” meaning that (in non-normalized
units) the variance becomes equal to the mean for infinitely strong coupling $\Delta \to \infty$
[30, 31]. This holds both in 1D and 2D, and with or without the short-correlations assumption.

Equations (9), (10) and (11) can be solved for the variables $\bar{g}_i$, $\sigma_i^2$ and
$C_{ij}$ after specifying the detailed forms of the regulatory function and noise strength and
imposing meaningful boundary conditions. If the regulatory function and noise do not contain
terms nonlinear in the variables of interest, Eqs. (9), (10) and (11) form a set of coupled
linear equations that can be solved by simple matrix inversion. Also notice that we may use
Eq. (11) to simplify Eq. (10) further if we are only interested in solving for the variances
$\sigma_i^2$ (as in this work); we show the respective formulas for the 1D and 2D model in
Appendix B.
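
To illustrate the "coupled linear equations" remark for the simplest case, the sketch below solves for the steady-state means on a 1D chain. It is an illustration under stated assumptions rather than the procedure used in this work: it starts from the steady state of Eq. (2) multiplied by the protein lifetime, which gives (1 + coupling * k_i) g_i - coupling * (sum over neighbors of g_n) = f(c_i), with k_i the number of neighbors and `coupling` the dimensionless diffusion constant. Reflective ends are assumed to mean that the two end volumes simply have one neighbor. The function names are hypothetical.

```python
import numpy as np


def chain_laplacian(n_x):
    """Graph Laplacian (degree minus adjacency) of a 1D chain of n_x volumes.

    Assumption: reflective ends are modeled as "no hop across the edge",
    so the two end volumes have one neighbor instead of two.
    """
    lap = np.zeros((n_x, n_x))
    for i in range(n_x - 1):
        lap[i, i] += 1.0
        lap[i + 1, i + 1] += 1.0
        lap[i, i + 1] -= 1.0
        lap[i + 1, i] -= 1.0
    return lap


def steady_state_means(f_values, coupling):
    """Solve (I + coupling * Laplacian) g = f for the normalized mean outputs g.

    f_values : regulatory function evaluated in each volume, f(c_i)
    coupling : dimensionless diffusion constant (hopping rate times lifetime)
    """
    f_values = np.asarray(f_values, dtype=float)
    system = np.eye(f_values.size) + coupling * chain_laplacian(f_values.size)
    return np.linalg.solve(system, f_values)
```

In the bulk, where every volume has 2d neighbors, this system is Eq. (9) rearranged. Because hopping only redistributes molecules, the sum of the means equals the sum of the f(c_i) for any coupling, which makes a convenient sanity check.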


C. Regulatory function and gene expression noise 

To describe the probability of the promoter being in its active state given that the
concentration of the transcription factor is $c$, we choose a simple Hill-type regulation model:

$$f(c) = \frac{c^H}{c^H + K^H}, \tag{12}$$

where $H$ is the Hill coefficient, quantifying the cooperativity of the activation process, and
$K$ the activation threshold, i.e., the concentration at which half-activation occurs,
$f(K) = 1/2$.
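
Both the Hill function and its derivative with respect to the input enter the noise expressions that follow, so a small helper is useful when experimenting with the model. This is a minimal sketch with hypothetical function names; the derivative is obtained by differentiating Eq. (12) and is intended for positive concentrations.

```python
import numpy as np


def hill(c, K, H):
    """Hill-type regulatory function f(c) = c^H / (c^H + K^H), Eq. (12)."""
    c = np.asarray(c, dtype=float)
    return c**H / (c**H + K**H)


def hill_derivative(c, K, H):
    """Analytic derivative df/dc = H K^H c^(H-1) / (c^H + K^H)^2, for c > 0."""
    c = np.asarray(c, dtype=float)
    return H * K**H * c**(H - 1) / (c**H + K**H) ** 2
```

At the threshold the slope is H / (4K), so a larger Hill coefficient or a smaller threshold both steepen the response, and with it the sensitivity to fluctuations in the input.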

To complete our analytical model we need to accurately specify the noise powers
$\langle \mathcal{G}_i^2 \rangle$ and $\langle \mathcal{G}_{(i \to j)}^2 \rangle$ appearing in
the noise terms $\mathcal{N}_{ii}^2$ and $\mathcal{N}_{ij}^2$. In steady state, all of these
noise powers only depend on the mean levels of the transcription factor ($\{c_i\}$) and product
($\{\bar{g}_i\}$).

Let us first specify the power of the regulation/production/degradation noise,
$\langle \mathcal{G}_i^2 \rangle$, which has two contributions: super-Poissonian “input noise”
$\langle \mathcal{G}_{TF,i}^2 \rangle$ originating from the random arrival of regulatory
transcription factors, and Poissonian “output noise” $\langle \mathcal{G}_{PD,i}^2 \rangle$ from
the production and degradation of the gene product. The input noise is a function of the
transcription factor concentration and is well approximated by [TT1E2 ISSj

$$\langle \hat{\mathcal{G}}_{TF,i}^2 \rangle = r_{\max}^2 \left( \frac{df(c)}{dc} \right)^2 \frac{2c}{D_c l_c} \bigg|_{c=c_i} \tag{13}$$

where $D_c$ is the diffusion coefficient of the transcription factors, $l_c$ the typical size of
a transcription factor binding site and $c_i$ the local activator concentration. The above
formula describes fluctuations of $G_i$ in absolute units (marked by the hat); to obtain the
noise power $\langle \mathcal{G}_{TF,i}^2 \rangle$ of fluctuations in the normalized output
$g_i$, $\langle \hat{\mathcal{G}}_{TF,i}^2 \rangle$ has to be normalized as well by
$N_{\max}^2 = (r_{\max}\tau)^2$:

$$\langle \mathcal{G}_{TF,i}^2 \rangle = \frac{\langle \hat{\mathcal{G}}_{TF,i}^2 \rangle}{(r_{\max}\tau)^2} = \frac{1}{N_{\max}\tau} \left( \frac{df(c)}{dc} \right)^2 \frac{2 c N_{\max}}{D_c l_c \tau} \bigg|_{c=c_i} = \frac{1}{N_{\max}\tau} \left( \frac{df(c)}{dc} \right)^2 2 c c_0 \bigg|_{c=c_i} \tag{14}$$

In the last step, $c_0 \equiv N_{\max}/(D_c l_c \tau)$ defines a typical concentration scale for
our problem; in the following we will measure concentrations in units of $c_0$.

The output noise in absolute units, $\langle \hat{\mathcal{G}}_{PD,i}^2 \rangle$, can be written
as the sum of production and degradation rate, as for a regular birth/death process:

$$\langle \hat{\mathcal{G}}_{PD,i}^2 \rangle = r_{\max} f(c_i) + \frac{1}{\tau} \langle G_i \rangle \tag{15}$$

Note that in general $\langle G_i \rangle \neq r_{\max} f(c_i) \tau$ due to the diffusive
coupling. Also note that bursty production (not considered in this work) could be easily
incorporated into the model via a prefactor $\phi$ (Fano factor) in the production term:
$r_{\max} f(c_i) \to \phi\, r_{\max} f(c_i)$. In normalized units the production/degradation
noise reads:


$$\langle \mathcal{G}_{PD,i}^2 \rangle = \frac{\langle \hat{\mathcal{G}}_{PD,i}^2 \rangle}{N_{\max}^2} = \frac{1}{N_{\max}\tau} \left( f(c_i) + \bar{g}_i \right) \tag{16}$$

In our model, diffusive hopping of a molecule is identical with its degradation in the volume of
origin and simultaneous production of a molecule in the volume of arrival. The corresponding
noise powers $\langle \mathcal{G}_{(i \to n)}^2 \rangle$ thus are given by the rate of particle
loss through diffusive hopping from the volume of origin, which in steady state is given by the
mean copy number in that volume times the hopping rate, and can be normalized in the same way as
the other noise powers:

$$\langle \mathcal{G}_{(i \to n)}^2 \rangle = \frac{h \langle G_i \rangle}{N_{\max}^2} = \frac{\Delta}{N_{\max}\tau}\, \bar{g}_i \tag{17}$$

Adding up all contributions, we obtain for the (normalized) combined noise powers
$\mathcal{N}_{ij}^2$ and $\mathcal{N}_{ii}^2$:

$$\mathcal{N}_{ij}^2 = -\langle \mathcal{G}_{(i \to j)}^2 \rangle - \langle \mathcal{G}_{(j \to i)}^2 \rangle = -\frac{\Delta}{N_{\max}\tau} \left( \bar{g}_i + \bar{g}_j \right) \tag{18}$$

$$\mathcal{N}_{ii}^2 = \langle \mathcal{G}_{PD,i}^2 \rangle + \langle \mathcal{G}_{TF,i}^2 \rangle + \sum_{n \in N(i)} \left( \langle \mathcal{G}_{(i \to n)}^2 \rangle + \langle \mathcal{G}_{(n \to i)}^2 \rangle \right) = \frac{1}{N_{\max}\tau} \left[ f(c_i) + \bar{g}_i + \left( \frac{df(c)}{dc} \right)^2 2 c c_0 \bigg|_{c=c_i} + \Delta \sum_{n \in N(i)} \left( \bar{g}_i + \bar{g}_n \right) \right] \tag{19}$$


D. Quantifying positional information

Building on our previous work [10-12], we quantify the amount of information that the noisy
output signal $g(x)$ carries about the position $x$ using the mutual information $I(x; g)$
between the input and output |571 55], a central quantity of Shannon’s information theory E3:

$$I(x; g) = \int dx\, P_x(x) \int dg\, P(g|x) \log_2 \left[ \frac{P(g|x)}{P_g(g)} \right] \tag{20}$$

In the information-theoretic picture introduced here, the information channel is defined by a
set of sampling units (cells or nuclei at a particular position) that implement the same gene
regulatory network. These sampling units read out the input signal $c$ with a distribution that
is jointly determined by the shape of the concentration field, $c(x)$, and the spatial
distribution of the sampling units, $P_x(x)$. In response to these inputs, the regulatory
networks locally express the output gene $g$ at the corresponding level, producing an output
distribution $P_g(g)$ across the ensemble of sampling units.

The mutual information can be rewritten as a difference between the entropy of the output
distribution, $S[P_g(g)] = -\int dg\, P_g(g) \log_2 P_g(g)$, and the average entropy of the
conditional distribution of $g$ at fixed position $x$:

$$I(x; g) = S[P_g(g)] - \int dx\, P_x(x)\, S[P(g|x)] \tag{21}$$

When only the local distribution of outputs $P(g|x)$ is known, $P_g(g)$ can be straightforwardly
constructed from $P(g|x)$ and $P_x(x)$:

$$P_g(g) = \int dx\, P_x(x)\, P(g|x) \tag{22}$$

In our discrete model we assume uniform spacing of the sampling positions
$x_i \in [0, N_x \delta]$, i.e., $P_x(x_i) = 1/N_x$. Equations (21) and (22) then respectively
become:

$$I(x; g) = -\int dg\, P_g(g) \log_2 P_g(g) + \frac{1}{N_x} \sum_{i=1}^{N_x} \int dg\, P(g|x_i) \log_2 P(g|x_i) \tag{23}$$

$$P_g(g) = \frac{1}{N_x} \sum_{i=1}^{N_x} P(g|x_i) \tag{24}$$

$I(x; g)$ is therefore fully characterized by the conditional output distribution $P(g|x_i)$.
Since here we only consider non-multiplicative white noise, $P(g|x_i)$ is Gaussian:

$$P(g|x_i) = \frac{1}{\sqrt{2\pi\sigma_g^2(x_i)}} \exp\left( -\frac{(g - \bar{g}(x_i))^2}{2\sigma_g^2(x_i)} \right) \tag{25}$$

The information $I(x; g)$ is therefore fully determined by the (local) means and variances of
the conditional distributions $P(g|x_i)$.
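
Since the information depends only on the local means and variances, it can be evaluated numerically once those are known. The sketch below is one possible way to do this and is not taken from the original work: it builds the output distribution as an equal-weight mixture of the Gaussian conditionals on a uniform grid, integrates its entropy with a simple Riemann sum, and subtracts the average Gaussian entropy, which is known in closed form. The function name, the grid size and the 6-sigma integration window are all assumptions.

```python
import numpy as np


def positional_information(means, variances, n_grid=4000):
    """Mutual information I(x; g) in bits for Gaussian conditionals P(g|x_i)
    with uniformly weighted positions, following the discrete form of the
    entropy difference (output entropy minus mean conditional entropy)."""
    means = np.asarray(means, dtype=float)
    variances = np.asarray(variances, dtype=float)
    sigmas = np.sqrt(variances)

    # Integration grid covering all conditionals out to 6 standard deviations.
    g = np.linspace((means - 6 * sigmas).min(), (means + 6 * sigmas).max(), n_grid)
    dg = g[1] - g[0]

    # Conditional Gaussians, one row per position; shape (N_x, n_grid).
    p_cond = np.exp(-(g[None, :] - means[:, None]) ** 2 / (2 * variances[:, None]))
    p_cond /= np.sqrt(2 * np.pi * variances[:, None])

    # Output distribution: equal-weight mixture over positions.
    p_out = p_cond.mean(axis=0)
    mask = p_out > 0
    s_out = -np.sum(p_out[mask] * np.log2(p_out[mask])) * dg

    # Differential entropy of a Gaussian: 0.5 * log2(2 pi e sigma^2).
    s_noise = np.mean(0.5 * np.log2(2 * np.pi * np.e * variances))
    return s_out - s_noise
```

A uniform grid is adequate only when the smallest standard deviation is resolved by many grid points; if the variances differ by orders of magnitude across positions, a finer or adaptive grid would be needed.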

In our previous work [10-12], we have computed the maximum achievable information transmission
(channel capacity) by optimizing over the distribution of input signals, relying on the “small
noise approximation” to make the problem analytically tractable. Here, in contrast, the spatial
setting has led us to choose a uniform distribution of sampling points, $P_x(x)$. While this
relieves us of the need to use the small noise approximation, to fully specify the problem we
still need to select the function $c(x)$ that maps the input of the regulatory network to
spatial positions. Motivated by the developmental example of the fruit fly, here we choose an
exponential function:

$$c(x) = c_{\max}\, e^{-x/(\lambda\gamma_0)}, \tag{26}$$

where $c_{\max}$ controls the maximum achievable input, $\gamma_0 \equiv L/5 = N_x\delta/5$ is
the decay length of a typical exponential morphogen gradient relative to the system size $L$
[36, 37, 59], and $\lambda$ a variable dimensionless scaling factor (the precise value of
$\gamma_0$ is insignificant; we could opt for any convenient length unit). This choice generates
a family of input profiles parametrized by $c_{\max}$ and $\lambda$, and we will explore various
settings for these parameters to maximize the positional information encoded by the expression
level $g$.
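
For concreteness, the exponential input family can be tabulated on the lattice as follows. This is a minimal sketch with hypothetical names; it assumes the decay length is the product of the scaling factor and one fifth of the system length, as described above, and it places the sampling positions at integer multiples of the lattice spacing starting from zero, which is a choice made here for illustration.

```python
import numpy as np


def exponential_input(n_x, delta, c_max, scale):
    """Return lattice positions x_i and input concentrations c(x_i).

    n_x   : number of volumes along the axial direction
    delta : lattice spacing
    c_max : input concentration at x = 0
    scale : dimensionless scaling factor of the decay length
    """
    system_length = n_x * delta
    gamma_0 = system_length / 5.0          # reference decay length, L / 5
    x = np.arange(n_x) * delta             # assumed positions: 0, delta, 2 delta, ...
    return x, c_max * np.exp(-x / (scale * gamma_0))
```

With 50 volumes of unit spacing and a scaling factor of 1, the profile has dropped by a factor of e at the eleventh volume (x = 10), and by roughly e^5 across the whole system.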

We can now assemble the different components of the model. By inserting the regulation and noise
terms [Eqs. (12), (18) and (19)] into the steady-state solutions of the coupled stochastic
equations [Eqs. (9), (10), (11)] we obtain two coupled linear equation systems for the mean
expression levels $\bar{g}_i$ and the (co)variances $C_{ij}$ and $\sigma_i^2$, which we can solve
given the selected boundary conditions. Using Eq. (23) we can then compute the mutual
information $I(x; g)$. When computing the stochastic moments, we can vary the parameters of the
input function, the regulation and the spatial coupling, and thus compute $I(x; g)$ as a function
of its key determinants. The baseline set of parameters that we used for numerical computation
is presented in Appendix D.
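
As a quick consistency check on the assembled expressions, consider the uncoupled limit in which the dimensionless diffusion constant vanishes. The short derivation below is a sketch added for illustration, based only on Eqs. (9), (10) and (19); it writes $\Delta$ for the dimensionless diffusion constant, $T$ for the effective residence time, $\Lambda$ for the mixing parameter and $\mathcal{N}_{ii}^2$ for the combined local noise power.

```latex
% Uncoupled limit: \Delta = 0 implies T = \tau and \Lambda = 0.
\begin{align*}
\bar{g}_i &= T\, r_i = f(c_i), \\
\sigma_i^2 &= \frac{T}{2}\,\mathcal{N}_{ii}^2
  = \frac{\tau}{2}\cdot\frac{1}{N_{\max}\tau}
    \left[ f(c_i) + \bar{g}_i + \left(\frac{df}{dc}\right)^2 2 c\, c_0 \right]_{c=c_i}
  = \frac{1}{N_{\max}}
    \left[ f(c_i) + \left(\frac{df}{dc}\right)^2 c\, c_0 \right]_{c=c_i}.
\end{align*}
```

The first term is the Poissonian output noise and the second the input noise. For a Hill function whose threshold is of the same order as the input concentrations, the second term scales roughly as $c_0/c$, which is consistent with low input concentrations in units of $c_0$ being the regime where input noise dominates.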


III. RESULTS 

In order to elucidate how spatial coupling alters the capacity of encoding positional
information in the downstream gene $g$, we optimized the mutual information $I(x; g)$ over the
parameters of our model: the dissociation constant, $K$, and the Hill coefficient of regulation,
$H$; the (normalized) diffusion constant of the gene product, $\Delta$; and the length scale of
the input gradient, $\lambda$. As is known from previous work, the key parameter that
qualitatively determines the shape of the optimal solutions is the ratio of the output to input
noise, which is set by a dimensionless maximal input concentration, $C = c_{\max}/c_0$ EH; in
general, $C \gg 1$ is the regime of dominant output noise, while $C \ll 1$ is the regime of
dominant input noise, as defined in Section II C. We thus explored the optimal solutions as a
function of $C$ in what follows.


For computational efficiency we initially studied the 1D model with the short-correlations
assumption (SCA), and later compared it to the 2D models with and without SCA, finding only
minor differences (see Section III D).


A. Spatial coupling can enhance information 
transmission 

We first assessed how introducing diffusive coupling changes the information capacity of the
system in the space of regulatory parameters $K$ and $H$. To establish the baseline for
comparison, we started with the case without spatial coupling, $\Delta \approx 0$, fixing
$C = 1$ and input gradient length $\lambda = 1$. Figure 2A shows the “information plane” for
this case, i.e., the mutual information $I(x; g)$ as a function of the Hill coefficient $H$ and
the activation threshold $K$. Consistent with our previous studies [10-12], the information
plane displays a clear optimal choice of $H$ and $K$ that maximizes information transmission.
The optimum results from a nontrivial compromise between evenly spreading the whole dynamic
range of the output signal along the $x$-axis, which ideally requires low Hill coefficients and
half-activation at the system center ($x = L/2$, $K = c(L/2)$), and a system-wide minimization
of the noise; the latter generally favors activation at high input concentrations (to avoid
large fluctuations at low $c$) and thus higher $K$.

Figures 2B and 2C show the same information plane, but now at increasing values for the spatial
coupling $\Delta$. The first observation is that with nonzero $\Delta$ information can be
increased relative to the baseline at the optimal choice of $H$ and $K$, but that the
information drops again when the diffusive coupling is too strong, indicating that there exists
a nontrivial optimal value for $\Delta$ that maximizes information. The second observation
relates to the concomitant change in the optimal parameters $\{K^*, H^*\}$ as the diffusive
coupling is increased. In particular, increasing the diffusion constant shifts the optimum
towards higher $H$ and lower $K$. The third observation is that strong diffusion also increases
the capacity away from the optimum in the regime of high $H$, creating a flat plateau where the
precise value of $H$ is not crucial for attaining high information transmission.

All these observations can be understood intuitively: The increase in capacity for nonzero
diffusion is due to the trade-off between the ability of diffusive spatial averaging to reduce
noise (thus increasing the transmission), and smoothing out the response profile (thus
decreasing the transmission) [30, 31]. The shift in optimal regulatory parameters is a direct
consequence of these two effects: the detrimental effect of smoothing out the response profile
can be partially compensated for by choosing higher optimal Hill coefficients, while noise
reduction allows the optimal $K$ to move towards lower concentrations to better utilize the
dynamic range. The plateau in information at high diffusion results from the ability of the
diffusion to flatten sharp, nearly step-like gene activation profiles at high $H$ (limited to at
most 1 bit of positional information for $H \to \infty$ and $\Delta \to 0$) into smooth spatial
gradients that can convey more positional information.

FIG. 2. Information planes for different values of the spatial coupling, $\Delta$. Shown is the
mutual information $I(x; g)$ as a function of the Hill coefficient $H$ and the activation
threshold $K$, (A) with negligible spatial coupling, $\Delta \approx 0$; (B) with intermediate
spatial coupling, $\Delta = 10$; and (C) with very strong spatial coupling, $\Delta \approx 316$.
The data shown are for the 1D model with the short-correlations assumption.
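
The 1-bit ceiling for step-like activation without diffusion follows from a generic entropy bound, sketched here for illustration. The argument assumes that the output takes only two distinguishable levels (OFF and ON) and introduces a hypothetical symbol $p$ for the fraction of positions on the ON side of the threshold.

```latex
\begin{align*}
I(x; g) \;\le\; S[P_g]
  \;=\; -p \log_2 p - (1-p)\log_2(1-p)
  \;\le\; \log_2 2 \;=\; 1~\text{bit}.
\end{align*}
% Equality in the last step requires p = 1/2, i.e. a threshold at mid-system,
% and equality in the first step requires negligible noise in the two levels.
```

Any graded profile with more than two reliably distinguishable output levels can exceed this bound, which is why smoothing a sharp step by diffusion can raise the positional information.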


B. Spatial coupling is most beneficial when input 
noise is dominant 

After understanding the optimal behavior of information as a function of regulatory parameters
$H$, $K$ at fixed $C = 1$, $\lambda = 1$, and $\Delta$, we systematically varied the diffusive
coupling $\Delta$ and the input gradient length scale $\lambda$ for several different values of
$C$. For each choice of $(\Delta, \lambda)$ parameters, we separately mapped out the information
as a function of $H$ and $K$ to find the optimal values $(H^*, K^*)$ and the corresponding
maximal information transmission.

Figure 3 summarizes this exploration and shows the information capacity as a function of
$\Delta$ and $\lambda$ for a series of 6 different $C$ values. White stars mark the maximal
capacity for the given $C$, i.e., the capacity for the jointly optimal choice of $H$, $K$,
$\Delta$ and $\lambda$.

The panel for $C = 1$ demonstrates two beneficial effects of diffusion on information capacity:
increasing $\Delta$ from very low values to $\Delta < 100$ increases $I(x; g)$ and
simultaneously enlarges the range of $\lambda$ over which high capacity can be reached. This
effect is even more pronounced when $C < 1$: at very low $C$ the beneficial values for the
diffusion constant $\Delta$ are strongly constrained to a narrow interval, and the resulting
capacity gain at optimal $\Delta$ with respect to low $\Delta$ increases markedly. In contrast,
when $C$ is high, diffusive coupling does not convey a large benefit in information
transmission. This is due to the fact that diffusion can only remove super-Poissonian parts of
the noise [30, 31]. In our model, super-Poissonian noise is the input noise, which plays a minor
role for $C \gg 1$, thereby limiting the potential role of diffusion. We note that
super-Poissonian fluctuations could also occur on the output side (e.g., when gene expression is
bursty and multiple protein copies are produced from each mRNA), in which case diffusive
coupling could be beneficial even for $C \gg 1$. As expected, for all $C$, the information
capacity drops to zero at very high $\Delta$ independently of all other parameters, because in
that limit all output profiles become flat and thus uninformative.


We systematically explored how diffusion affects the optimal information transmission as a function of C in Fig. 4 by comparing the optimally coupled spatial model to a model where diffusion is set to zero. Because the information transmission is largely invariant to γ and for most values of C peaks in a very broad plateau at γ = 1, we fixed γ = 1 for this comparison, while optimizing over all other parameters (separately for the spatially coupled system and the system with Λ ≈ 0). Clearly, in the low-C regime optimal diffusive coupling can enhance information capacity by more than a bit, while for C > 100 the noise composition is strongly dominated by Poissonian output noise such that diffusion cannot further improve information throughput.


Instead of varying C by changing the maximal input concentration c_max, we also varied C via N_max, thus altering output noise at constant levels of input noise. In that case we observe the same behavior, i.e. significant capacity enhancement by diffusion in the low-C regime, with the small difference that now the capacity increases with decreasing C ∼ 1/N_max, as presented in more detail in Appendix E.
















[Figure 3: six contour panels, for C = 100, 10, 1, 0.1, 0.01 and 0.001; horizontal axis: spatial coupling Λ, logarithmic from 10^-3 to 10^4; color scale in bits.]



FIG. 3. Information transmission as a function of the spatial coupling Λ and the input gradient length γ for various values of maximal input concentration, C. For comparison, the same scale is used for all contour maps. White stars mark the overall maximum of I(x;g) in each respective panel; white boxes show the corresponding optimal amount of transmitted information (in bits). Each datapoint was obtained by optimizing I(x;g) over the regulatory parameters H and K for the given set of C, Λ and γ. The data shown are for the 1D model with short-correlations assumption.


C. Spatial coupling enables a novel regulatory 
strategy at high input noise 

Spatial coupling affects both the noise level of the output gene and the shape of its output profile. We wondered how these two effects combine to affect the information transmission. Figure 5 shows the optimal mean output profile ḡ(x), the underlying activation profile f(x), and two measures of the noise in gene expression, the standard deviation σ(x) and the Fano factor Φ(x) = N_max σ²(x)/ḡ(x), as a function of position x = x/L ∈ [0, 1]. We plot these quantities for three values of C, representative of the input-noise-dominated regime (C = 0.01), the regime in which input and output noise are approximately balanced (C = 1), and the output-noise-dominated regime (C = 100). In each plot we directly compare the system in which the diffusion constant is optimized along with all other parameters with the system that is optimal at zero spatial coupling.


For C = 100, input noise levels are so low that the total noise is dominated by purely Poissonian output noise throughout the whole system (Φ(x) ≈ 1). Therefore, strong spatial coupling would fail to decrease the noise further and would merely fade out the mean output profile. Consequently, the optimal diffusion constant Λ* = 1 is low, and the spatially coupled system only differs marginally from the uncoupled system.

For C = 1 in the uncoupled system, input noise becomes more prominent as a fraction of the total noise, such that Φ(x) > 1 for x > 0.3 (cyan colored line in Fig. 5F); in line with this, the activation threshold is shifted towards higher input concentrations (i.e. smaller x) at the expense of reducing the dynamic range. In the coupled system the optimal diffusion constant is markedly higher (Λ* = 25) than at C = 100, and the resulting spatial averaging can almost completely remove the additional non-Poissonian noise contributions (Φ(x) ≈ 1). This allows the optimal activation threshold
FIG. 4. Information capacity for varying maximal input concentration, C. Shown is the optimized information capacity as a function of C = c_max/c_0 with optimal diffusive coupling (I*(C), solid blue line) and without diffusive coupling (I_0(C), dashed blue line). At small C, non-Poissonian input noise is dominant, while Poissonian output noise dominates for large C. The blue-shaded area depicts the maximal gain in information capacity from diffusive coupling: spatial coupling enhances information capacity most efficiently when input noise dominates. The dashed vertical lines mark the C values corresponding to the three cases displayed in Fig. 5.


[Figure 5 column titles: C = 100, Λ* = 1; C = 1, Λ* = 25; C = 0.01, Λ* = 75]
FIG. 5. Comparison of optimal output profiles for different values of the maximal input concentration, C. (A-C) Dominant output noise, C = 100, blue lines. (D-F) Balanced noise, C = 1, red lines. (G-I) Dominant input noise, C = 0.01, green lines. For every value of C, the normalized mean output profile ḡ(x) (dashed lines) and the optimal activation profile f(x) (faint solid lines) are shown in the top row; the noise in gene expression is shown as the standard deviation σ(x) in the middle row; and the Fano factor Φ(x) = N_max σ²(x)/ḡ(x) is shown in the bottom row. All profiles are shown as a function of the normalized position, x = x/L. The corresponding quantities for the optimal spatially uncoupled (Λ ≈ 0) system are shown in complementary colors (see plot legends).


K to shift towards lower input concentrations in order to 
attain a similarly large dynamic range as for C = 100. 

At C = 0.01, when input noise is exceedingly dominant, we observe the emergence of a qualitatively new regulatory strategy in the spatially coupled system. For both spatially coupled and uncoupled systems, low C favors sharp activation profiles with high Hill coefficients H. For the spatially uncoupled system, the optimal activation threshold K* moves towards central positions. In contrast, in the optimal spatially coupled system, K* moves closer towards x = 0 where input noise is smaller, and the optimal Hill coefficient makes the activation nearly step-like. Without spatial coupling, this strategy could not be optimal, because it would be limited to at most one bit: in the region where the activation profile is flat, positions could not be discriminated based on the gene expression level g. Optimal diffusion can, however, smooth out this step-like activation to generate a graded gene expression profile and restore position discriminability at essentially all positions. Simultaneously, optimal diffusion transforms the noise in an interesting way, as can be seen in Fig. 5H and I. The step-like activation curve suppresses the super-Poissonian input noise everywhere except for the narrow interval of x around the transition point where the super-Poissonian component has a very sharp and tall peak; but this can very effectively be removed by the strong diffusive spatial averaging. In sum, at low C, the optimal strategy is to use very sharp activation profiles, since they act as a “filter” for input noise (which must go to zero at zero or saturated gene expression); the strong optimal diffusion then smooths out the step-like activation into a graded response and removes much of the remaining input noise around the very localized transition region at c ≈ K*.
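The smoothing of a step-like activation into a graded mean profile can be illustrated numerically. The sketch below is a hedged toy example, not the optimization used for Fig. 5: the values K = 0.5 and H = 50, the zero-flux treatment of the two boundary sites, and the helper names are assumptions made only for illustration. Concentrations are measured in units of c_max, time in units of the protein lifetime, and the coupling value 75 is borrowed from the Λ* quoted for C = 0.01.

```python
import numpy as np

def hill(c, K, H):
    """Hill-type activation f(c) = c^H / (c^H + K^H)."""
    return c**H / (c**H + K**H)

def steady_state_mean(f, coupling):
    """Solve (1 + coupling*n_i) g_i - coupling * sum_neighbors g_n = f_i on a 1D chain.

    n_i is the number of neighbors of site i, so boundary sites exchange
    molecules with one neighbor only (assumed zero-flux boundaries).
    """
    n = len(f)
    A = np.zeros((n, n))
    for i in range(n):
        neighbors = [j for j in (i - 1, i + 1) if 0 <= j < n]
        A[i, i] = 1.0 + coupling * len(neighbors)
        for j in neighbors:
            A[i, j] = -coupling
    return np.linalg.solve(A, f)

def n_distinguishable_steps(profile, tol=1e-3):
    """Count neighboring site pairs whose mean levels differ by more than tol."""
    return int(np.sum(np.abs(np.diff(profile)) > tol))

x = np.linspace(0.0, 1.0, 60)        # normalized positions of 60 nuclei
c = np.exp(-x / 0.2)                 # exponential input, gradient length 0.2 L
f = hill(c, K=0.5, H=50.0)           # nearly step-like activation profile
g = steady_state_mean(f, coupling=75.0)

steps_without_diffusion = n_distinguishable_steps(f)
steps_with_diffusion = n_distinguishable_steps(g)
```

Under these assumptions only a handful of neighboring sites differ in their activation level, while the diffusively coupled mean profile changes measurably between most neighboring sites; this is the sense in which diffusion restores position discriminability. The sketch addresses mean profiles only and says nothing about the noise.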

Usually, the effect of diffusion on information transmission is understood to be a tradeoff between the beneficial effect of noise averaging and the detrimental effect of smoothing the profile, but here it seems that smoothing the step-like activation profile is also beneficial for the information flow. To check that this is indeed a qualitatively different optimal regulatory strategy, we performed a systematic study of the optimal regulatory parameters between C = 0.1 and C = 0.01. We found that here two regulatory strategies compete, as evidenced by two local maxima in the (K, H) information planes for this range of C values: one favors relatively low Hill coefficients, H* ≈ 1, and high activation thresholds K*; the other favors high H* and lower K*. For C = 0.01 and 10 ≲ Λ < 30 both regulatory strategies lead to almost equal information capacities, but for Λ > 30 steep activation combined with fast diffusion is preferable. Figure S2 in Appendix F illustrates this phenomenon.
D. Full solution in 2D geometry differs only marginally from an approximate solution in 1D

All results presented in the previous sections were obtained using the 1D model with short-correlations assumption (SCA). To assess how representative they are of the fully detailed model, we compared the 1D model with SCA to the 2D model with SCA, and finally to the 2D model that retains more than only nearest-neighbor correlations (see Appendix B for the formulas in the first two cases and Appendix C for the latter case).

Figure S3 in Appendix G compares the information planes I(Λ, γ) for the three cases at C = 1. The results suggest that the influence of lateral spatial coupling present in the 2D model is minor. Particularly in the range of spatial couplings Λ around the information maximum, where the system is operating close to the hard bound set by irreducible Poissonian noise, the difference between information capacities in 1D and 2D is almost negligible. Inclusion of 2D couplings mainly increases I(x;g) values away from the optimal range of parameters, thus effectively enlarging the region of parameter space in which the system can attain close-to-optimal information transmission. This again is most prominent for the low-C regime, as expected, but still limited to a small fraction (< 10%) of the total capacity (data not shown). The effect of incorporating longer-ranged correlations is minor as well: the information planes for the 2D model with and without SCA are almost indistinguishable.

Overall, this demonstrates that the 1D model with short-correlations assumption is a good approximation of the fully detailed model. In the given context, the strength of spatial coupling appears more relevant than its topology.


IV. DISCUSSION 


We developed a generic framework to compute information flow through spatially coupled gene regulatory networks at steady state. As the simplest example, we considered how a spatially varying concentration field (“input profile”) is read out collectively by a regular 1D or 2D lattice of sampling units that are interacting by local diffusion of the gene product. A directly applicable example could be cells (or nuclei) in a developing multicellular organism that respond to the morphogen field by expressing developmental genes whose products can be exchanged between neighboring cells. We emphasize that our framework is generic in that effects due to spatial coupling in any chosen geometry can be worked out before assuming any particular gene regulatory function and writing down the relevant noise power spectra, as is evident from Eqs. (9), (10) and (11). This makes the framework widely applicable to a broad range of biological problems in which spatially distributed information is collectively sensed by a set of discrete sampling units and encoded into a spatial output.


The theory presented here is a direct continuation of our previous work on quantifying information flow in gene regulatory networks at increasing levels of realism pan. In our previous approaches, however, we were striving for analytical approximations that would permit computing the channel capacity, that is, performing analytical optimization over all possible distributions of input signals, P_c(c); this led us to adopt the “small noise approximation.” Here, in contrast, we assumed a particular geometry (with a uniform distribution over sampling units, P_x(x)) and a particular functional form of the input concentration field, c(x); the latter can depend on parameters which one can optimize, but we do not perform functional optimization over all possible c(x). While these restrictions may appear stronger than in our previous work, they also allow us to move to a truly spatially discrete setup (that has a natural minimal spatial scale of a cell or nucleus, as is the case in reality), and permit us to relax the small noise approximation, which is no longer assumed in this work. We also note that a parametric choice of c(x) is not as restrictive as it might appear initially. First, one could choose a rich enough parametric form (basis set) for c(x) that ensures spatial smoothness but otherwise allows optimization in a full space of plausible profiles. Second, it is intuitively clear that monotonic functions encode positional information better than non-monotonic, or even constant functions, strongly narrowing the range of functional forms over which optimization should take place. Third, while we focused on exponential gradients which can be easily parametrized by only two parameters, exponential profiles actually are widespread throughout biology [331IM102 E2 EO].

Several previous approaches assessed how diffusive coupling alters the precision of spatial protein patterns driven by spatially distributed inputs [301 EH EE]. While similar in spirit to ours, these works employed measures of “regulatory precision” such as the output pattern steepness and sharpness, which, unlike mutual information, do not capture the problem in its full richness: these measures quantify precision only locally, on parts of the output signal, and moreover make implicit assumptions about which feature of the output (boundary steepness, boundary position, etc.) is informative. Information theory removes this arbitrariness [48] and permits extensions beyond the simple cases studied here. Similar approaches have recently been suggested for specific biological systems [M[ [351 El].

In our example application we studied how spatial coupling alters optimal information flow in a single-input/single-output gene regulatory network motivated by the bicoid-hunchback system, which is a part of the gap gene network in early fruit fly development [20, 21]. The main outcome of this analysis is that diffusive coupling enhances information capacity by removing super-Poissonian components of the noise, in line with previous work [30, 31]; this effect is large when input noise is non-negligible (C < 1). When diffusion plays a large role in an optimal system, the activation functions can deviate substantially from the resulting output gene expression profiles. At the extreme, the activation functions can become step-like with very high Hill coefficients, while the gene expression profile still smoothly changes over its full dynamic range; this strategy, optimal at low C, where both spatial noise averaging and profile smoothing due to diffusion act jointly to increase information, is qualitatively different from the high-C regime where profile smoothing is detrimental and trades off against noise averaging.

In our setup, super-Poissonian contributions to noise are fully accounted for by the input noise, but in reality this need not be the case. There are super-Poissonian contributions to noise that arise at the output, for instance, due to bursty production of proteins when the gene is activated. The simplest case is when mRNA expression is rate limiting, but then each mRNA can lead to a burst in the number of translated proteins. The theory can be simply extended to these cases by introducing a burst size into the relevant noise power spectra. Importantly, this would imply that diffusion can be effective at noise reduction (and thus beneficial for information transmission) also in the regime where C > 1. In support of this, recently another model linking mRNA expression and protein production with diffusion based on stochastic branching theory has shown that deviations from Poissonian statistics in typical biological conditions are expected only for rather large burst sizes (> 50) [53].

Recently, the field has made progress in detailed understanding of input-side noise in gene regulation due to stochastic diffusive arrivals of regulatory molecules to their binding sites [53J 55]. This has led to a revision of the previously suggested functional form for the Berg-Purcell limit [55] [57] that we use in Eq. (13). Furthermore, we have also taken the simplest (Hill-type) regulatory function as our model for gene regulation, rather than picking a richer and potentially more realistic choice, such as the Monod-Wyman-Changeux form. These refinements will be included into our model in the future, but there is no reason to expect that they could change any qualitative outcome of our analyses.

It is interesting to speculate about the predictive power of optimization-based approaches applied to biological systems. In neuroscience, the principle of “efficient coding”, which states that neural sensory systems devote their limited resources to maximize the information flow, in bits, from the naturalistic stimuli into the spiking neural representation [1551 , has proven enormously successful since the end of the 1950s when it was first suggested by Barlow [69]. In signaling and gene regulation, similar ideas are much younger. On the other hand, the number of distinct genes or signaling proteins involved in the networks of interest is much smaller than the number of neurons; furthermore, molecular processes and the physical limits to sources of noise in gene regulation might be more easily understood than in neuroscience, where even a single neuron is a very complex object. It appears feasible that for small genetic networks the optimization problem would be tractable upon combining all phenomenology that until now we have analyzed separately: multiple, interacting genes, driven by one or more inputs, potentially with arbitrary feedback, all major relevant sources of noise, spatial coupling, and readout constraints proj- il4| . A major goal would also be the ability to relax the steady-state assumptions, by considering either readout at particular time points (out of steady state), or the information between full state trajectories [T5UT5]. Such a predictive theory, from which optimal networks could be derived mathematically, could then be realistically confronted with well-studied networks that can be quantitatively measured, e.g., the gap gene network active during Drosophila development 20), EH [39]. It is intriguing to think that even in the molecular world the need to pay for abstract bits of information (in the currency of time or energy) led nature to choose particular regulatory networks, and perhaps poised them at special operating points m-
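Appendix A: Calculation of stationary means and (co)variances

Considering Eq. © in steady state immediately yields:

$$ \langle F(g_i, c_i) \rangle = f(c_i) - \frac{1}{\tau} \langle g_i \rangle = h \sum_{n_i \in N(i)} \left( \langle g_i \rangle - \langle g_{n_i} \rangle \right) = 2dh \langle g_i \rangle - h \sum_{n_i \in N(i)} \langle g_{n_i} \rangle \tag{A1} $$

This simply reflects the steady-state balance between production (and degradation) and diffusive in-/outflux in volume i. Grouping $\langle g_i \rangle$ terms on one side gives

$$ \langle g_i \rangle = \frac{\tau}{1 + 2dh\tau} \left[ f(c_i) + h \sum_{n_i \in N(i)} \langle g_{n_i} \rangle \right] \quad \Leftrightarrow \quad \bar g_i = T f(c_i) + \Delta^2 \sum_{n_i \in N(i)} \bar g_{n_i} \tag{A2} $$

where in the last step we abbreviate $\bar g_i \equiv \langle g_i \rangle$, $T \equiv \tau / (1 + 2dh\tau)$, $\Delta^2 \equiv hT$, as in the main text.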
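Equation (A2) can be read as a fixed-point relation for the mean profile. The following is a minimal sketch of iterating it, under assumptions that are not taken from the paper: a periodic 1D ring (so that every site has exactly 2d = 2 neighbors), time measured in units of τ so that T = 1/(1 + 2hτ) and Δ² = hτ T, and a hypothetical cosine-shaped production profile `f`. The function name is likewise hypothetical.

```python
import numpy as np

def mean_profile_fixed_point(f, coupling, tol=1e-12, max_iter=100_000):
    """Iterate g_i <- T f_i + Delta2 (g_{i-1} + g_{i+1}) on a periodic 1D ring."""
    f = np.asarray(f, dtype=float)
    T = 1.0 / (1.0 + 2.0 * coupling)     # T in units of tau, with coupling = h*tau
    delta2 = coupling * T                # Delta^2 = h T
    g = f.copy()
    for _ in range(max_iter):
        g_new = T * f + delta2 * (np.roll(g, 1) + np.roll(g, -1))
        if np.max(np.abs(g_new - g)) < tol:
            return g_new
        g = g_new
    raise RuntimeError("fixed-point iteration did not converge")

sites = np.arange(32)
f = 0.5 + 0.5 * np.cos(2 * np.pi * sites / 32)   # hypothetical production profile
g = mean_profile_fixed_point(f, coupling=10.0)
```

The iteration converges because 2Δ² < 1 for any finite coupling. For a single Fourier mode the attenuation can be checked by hand: a cosine of wave number q is damped by the factor T / (1 − 2Δ² cos q), while the spatial average of the profile is unchanged.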

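To simplify the steady-state expression for the covariances resulting from Eq. ([3| it is instructive to treat its different parts separately. For the terms correlating the fluctuations in volume i with the production/degradation process in volume j we have (note that $\langle \delta g_i \rangle = 0$ by construction):

$$ \langle \delta g_i F(g_j, c_j) \rangle = \langle \delta g_i \rangle f(c_j) - \frac{1}{\tau} \langle \delta g_i g_j \rangle = -\frac{1}{\tau} \langle \delta g_i \left( \langle g_j \rangle + \delta g_j \right) \rangle = -\frac{1}{\tau} \langle \delta g_i \delta g_j \rangle \tag{A3} $$

Hence,

$$ \langle \delta g_i F(g_j, c_j) \rangle + \langle \delta g_j F(g_i, c_i) \rangle = -\frac{2}{\tau} \langle \delta g_i \delta g_j \rangle \tag{A4} $$

because $\langle \delta g_i \delta g_j \rangle = \langle \delta g_j \delta g_i \rangle$.

Similarly, we can rewrite the term that correlates $\delta g_i$ with the diffusive “neighbor fluxes” of $g_j$:

$$ h \Big\langle \delta g_i \sum_{n_j \in N(j)} (g_{n_j} - g_j) \Big\rangle = h \Big\langle \delta g_i \sum_{n_j \in N(j)} \left( \langle g_{n_j} \rangle + \delta g_{n_j} - \langle g_j \rangle - \delta g_j \right) \Big\rangle $$
$$ = h \sum_{n_j \in N(j)} \left( \langle \delta g_i \rangle \langle g_{n_j} \rangle + \langle \delta g_i \delta g_{n_j} \rangle \right) - 2dh \left( \langle \delta g_i \rangle \langle g_j \rangle + \langle \delta g_i \delta g_j \rangle \right) $$
$$ = h \sum_{n_j \in N(j)} \langle \delta g_i \delta g_{n_j} \rangle - 2dh \langle \delta g_i \delta g_j \rangle \tag{A5} $$

and analogously for the term in which i and j are exchanged.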

Reinserting (A4) and (A5) into the right side of Eq. ©, collecting covariance terms $\langle \delta g_i \delta g_j \rangle$ on the left side, and multiplying by $\tau/2$ we obtain:

$$ (1 + 2dh\tau) \langle \delta g_i \delta g_j \rangle = \frac{h\tau}{2} \left[ \sum_{n_j \in N(j)} \langle \delta g_i \delta g_{n_j} \rangle + \sum_{n_i \in N(i)} \langle \delta g_j \delta g_{n_i} \rangle \right] + \frac{\tau}{2} \sum_k \langle \Gamma_{ik} \Gamma_{jk} \rangle $$
$$ \Leftrightarrow \quad C_{ij} = \frac{\Delta^2}{2} \left[ \sum_{n_j \in N(j)} C_{i n_j} + \sum_{n_i \in N(i)} C_{j n_i} \right] + \frac{T}{2} \sum_k \langle \Gamma_{ik} \Gamma_{jk} \rangle \tag{A6} $$

where $C_{ij} \equiv \langle \delta g_i \delta g_j \rangle$. The above equation couples the covariance $C_{ij}$ to the covariances $C_{i n_j}$ and $C_{j n_i}$; these represent the correlations between the protein number in volume i and the neighbor volumes $n_j$ of j, and the analogous quantity with i and j exchanged, respectively. Hence, even if i and j are nearest neighbors on the lattice, the expressions summing $C_{i n_j}$ and $C_{j n_i}$ over $n_j$ and $n_i$, respectively, will contain correlations between next-nearest neighbor volumes. While the equation system defined by Eq. (A6) can be solved for the whole set of covariances $C_{ij}$ upon imposing suitable boundary conditions, this can be numerically expensive for larger spatial lattices and large parameter sweeps as part of optimization runs. In this work, the quantity of interest is the variance $C_{ii}$ in volume i, such that the calculation of longer-ranged correlations is not strictly required.
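For small lattices the full covariance system of the type of Eq. (A6) can be solved directly. The sketch below does this for a 1D chain by rewriting the stationarity condition as a matrix equation M C + C Mᵀ + N = 0 and vectorizing it. Several ingredients are assumptions made for illustration, not the paper's implementation: the open (zero-flux) chain, the explicit noise matrix (Poissonian production and degradation on the diagonal, hopping noise h(g_i + g_j) on each bond with the opposite sign between neighbors), the optional diagonal `extra_noise` standing in for a super-Poissonian input contribution, and all function and variable names. Copy numbers are not normalized here, so the Fano factor is simply variance over mean.

```python
import numpy as np

def full_covariance_1d(f, h, tau=1.0, extra_noise=None):
    """Means and full covariance matrix on an open 1D chain (no SCA)."""
    f = np.asarray(f, dtype=float)
    n = len(f)
    adj = np.diag(np.ones(n - 1), 1) + np.diag(np.ones(n - 1), -1)  # neighbors
    lap = np.diag(adj.sum(axis=1)) - adj                            # graph Laplacian
    M = -np.eye(n) / tau - h * lap             # linear drift of the fluctuations
    g = np.linalg.solve(-M, f)                 # steady-state means
    hop = h * adj * (g[:, None] + g[None, :])  # hopping noise power per bond
    N = np.diag(f + g / tau + hop.sum(axis=1)) - hop
    if extra_noise is not None:
        N = N + np.diag(extra_noise)           # assumed extra diagonal noise power
    # Stationarity M C + C M^T + N = 0, solved as a linear system for vec(C).
    eye = np.eye(n)
    lhs = np.kron(M, eye) + np.kron(eye, M)
    C = np.linalg.solve(lhs, -N.reshape(-1)).reshape(n, n)
    return g, C

f = 10.0 * np.exp(-np.arange(20) / 5.0)       # hypothetical production profile
g_p, C_p = full_covariance_1d(f, h=5.0)                        # Poissonian sources only
g_u, C_u = full_covariance_1d(f, h=0.0, extra_noise=5.0 * f)   # uncoupled, extra noise
g_c, C_c = full_covariance_1d(f, h=5.0, extra_noise=5.0 * f)   # coupled, extra noise
fano_uncoupled = np.diag(C_u) / g_u
fano_coupled = np.diag(C_c) / g_c
```

With purely Poissonian sources the solution is C = diag(ḡ) for any coupling, i.e. the Fano factor stays at one and diffusion has nothing to remove. With the extra diagonal noise, the coupled system has a lower Fano factor than the uncoupled one at every site but never drops below one, which is the behavior the main text attributes to spatial averaging. The Kronecker-product system has n² unknowns, which illustrates why the full solution becomes expensive for large lattices.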

Considering the cases i = j and $\nu \in N(i)$ ($\nu$ being one of the nearest neighbors of i) in Eq. (A6) separately reveals the following interdependence between the variance $\sigma_i^2 \equiv C_{ii}$ and the nearest-neighbor covariance $C_{i\nu}$:

$$ \sigma_i^2 = \Delta^2 \sum_{\nu \in N(i)} C_{i\nu} + \frac{T}{2} \sum_k \langle \Gamma_{ik}^2 \rangle \tag{A7} $$

$$ C_{i\nu} = \frac{\Delta^2}{2} \left[ \sum_{n_\nu \in N(\nu)} C_{i n_\nu} + \sum_{n_i \in N(i)} C_{\nu n_i} \right] + \frac{T}{2} \sum_k \langle \Gamma_{ik} \Gamma_{\nu k} \rangle \tag{A8} $$

Next-nearest correlations now only appear in Eq. (A8), which can be significantly simplified by an approximation that we call the “short-correlations assumption” (SCA): If the next-nearest-neighbor covariances are assumed to be small compared to the nearest-neighbor covariances and single-point variances, we can ignore them and set

$$ C_{i n_\nu} = 0 \;\; \forall\, n_\nu \neq i, \qquad C_{\nu n_i} = 0 \;\; \forall\, n_i \neq \nu \tag{A9} $$
because then the only neighbor of $\nu$ that has some significant correlation with i is i itself, and vice versa. In that case, the only remaining terms of the sums in Eq. (A8) are $C_{ii} = \sigma_i^2$ and $C_{\nu\nu} = \sigma_\nu^2$, respectively:

$$ C_{i\nu} = \frac{\Delta^2}{2} \left( \sigma_i^2 + \sigma_\nu^2 \right) + \frac{T}{2} \sum_k \langle \Gamma_{ik} \Gamma_{\nu k} \rangle \tag{A10} $$

Specifying further the noise powers $\sum_k \langle \Gamma_{ik}^2 \rangle$ in (A7) and $\sum_k \langle \Gamma_{ik} \Gamma_{\nu k} \rangle$ in (A10) as described in Section II B yields formulas (10) and (11).


Appendix B: Simplified variance formulae 


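In our framework, computation of the mutual information I(x;g) only requires the means $\bar g_i$ and variances $\sigma_i^2$. This enables us to reduce the number of coupled equations to be solved by direct insertion of (11) into (10), thus eliminating $C_{i\nu}$. For the 1D model with short-correlations assumption (SCA), after some algebraic steps this yields

$$ \sigma_i^2 = \frac{\Lambda^2}{(1+2\Lambda)^2 - \Lambda^2} \, \frac{1}{2} \left( \sigma_{i-1}^2 + \sigma_{i+1}^2 \right) + \frac{1}{N_{\max}} \left[ \frac{1+2\Lambda}{(1+2\Lambda)^2 - \Lambda^2} \left( \frac{f(c_i) + \bar g_i}{2} + \left. \left( \frac{df}{dc} \right)^2 \right|_{c_i} c_i c_0 \right) + \frac{\Lambda}{(1+2\Lambda) + \Lambda} \, \frac{1}{2} \left( \bar g_{i-1} + 2 \bar g_i + \bar g_{i+1} \right) \right] \tag{B1} $$

where we have rewritten prefactor combinations containing $T$ and $\Delta^2$ in terms of the spatial coupling $\Lambda$.

The analogous formula for the 2D model with SCA reads

$$ \sigma_{(ij)}^2 = \frac{\Lambda^2}{(1+4\Lambda)^2 - 2\Lambda^2} \, \frac{1}{2} \sum_{n_{(ij)}} \sigma_{n_{(ij)}}^2 + \frac{1}{N_{\max}} \left[ \frac{1+4\Lambda}{(1+4\Lambda)^2 - 2\Lambda^2} \left( \frac{f(c_{(ij)}) + \bar g_{(ij)}}{2} + \left. \left( \frac{df}{dc} \right)^2 \right|_{c_{(ij)}} c_{(ij)} c_0 \right) + \frac{\Lambda (1+3\Lambda)}{(1+4\Lambda)^2 - 2\Lambda^2} \, \frac{1}{2} \left( \sum_{n_{(ij)}} \bar g_{n_{(ij)}} + 4 \bar g_{(ij)} \right) \right] \tag{B2} $$

where we indicate volumes of the two-dimensional lattice with a 2D index (ij), and the $n_{(ij)}$-sums run over the four nearest neighbor volumes of (ij).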
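Once the means and variances are available, the mutual information can be evaluated numerically. The sketch below assumes, as an illustration only, that the conditional distributions P(g|x_i) are Gaussian with the given means and variances and that positions are uniformly distributed; the function name and the integration grid are hypothetical choices.

```python
import numpy as np

def positional_information_bits(mean, var, n_grid=20001):
    """I(x;g) in bits for uniform positions and Gaussian P(g|x_i)."""
    mean = np.asarray(mean, dtype=float)
    sd = np.sqrt(np.asarray(var, dtype=float))
    # Integration grid covering all conditional distributions out to 8 sigma.
    g = np.linspace((mean - 8 * sd).min(), (mean + 8 * sd).max(), n_grid)
    dg = g[1] - g[0]
    z = (g[None, :] - mean[:, None]) / sd[:, None]
    p_cond = np.exp(-0.5 * z**2) / (np.sqrt(2 * np.pi) * sd[:, None])
    p_g = p_cond.mean(axis=0)                  # P(g) = (1/N) sum_i P(g|x_i)
    nonzero = p_g > 0
    h_total = -np.sum(p_g[nonzero] * np.log2(p_g[nonzero])) * dg
    h_noise = np.mean(0.5 * np.log2(2 * np.pi * np.e * sd**2))
    return h_total - h_noise                   # I = S[P(g)] - <S[P(g|x)]>_x

# Eight well-separated expression levels: close to log2(8) = 3 bits.
info_separated = positional_information_bits(np.arange(8.0), np.full(8, 0.02**2))
# Same mean levels but much noisier readout: less information.
info_noisy = positional_information_bits(np.arange(8.0), np.full(8, 0.5**2))
# Flat profile: no positional information.
info_flat = positional_information_bits(np.zeros(5), np.ones(5))
```

The flat-profile case corresponds to the very strong coupling limit discussed in the main text, where all output profiles become uninformative.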


Appendix C: Full solution without short-correlations 
assumption 


For completeness, here we also state the formula defining the linear system for the coupled covariances in the full two-dimensional model without short-correlations assumption, again indicating volumes by the two-dimensional index (ij):

$$ C_{(ij)(kl)} = \frac{1}{2} \, \frac{1}{1+4\Lambda} \Big[ \mathcal{N}_{(ij)(kl)} + \Lambda \big( C_{(ij)(k-1,l)} + C_{(ij)(k+1,l)} + C_{(ij)(k,l-1)} + C_{(ij)(k,l+1)} + C_{(i-1,j)(kl)} + C_{(i+1,j)(kl)} + C_{(i,j-1)(kl)} + C_{(i,j+1)(kl)} \big) \Big] \tag{C1} $$

where the normalized noise term $\mathcal{N}_{(ij)(kl)}$ is only non-zero if the volumes indicated by (ij) and (kl) are identical, i.e. (ij) = (kl), or nearest neighbors, i.e. $(kl) \in N((ij))$, and takes one of the following forms, respectively:

$$ \mathcal{N}_{(ij)(kl)} = \begin{cases} G_{(ij)} + \sum_{n_{(ij)}} \left( V_{[(ij) \to n_{(ij)}]} + V_{[n_{(ij)} \to (ij)]} \right) & \text{if } (ij) = (kl) \\ -\left( V_{[(ij) \to (kl)]} + V_{[(kl) \to (ij)]} \right) & \text{if } (kl) \in N((ij)) \\ 0 & \text{else} \end{cases} \tag{C2} $$

Here $n_{(ij)}$ runs over the four nearest-neighbors of volume (ij). The normalized noise powers $G_{(ij)}$ and $V_{[(ij) \to (kl)]}$ derive from the expressions presented in Section II C:

$$ G_{(ij)} = \frac{1}{N_{\max}} \left[ f(c_{(ij)}) + \bar g_{(ij)} + 2 \left. \left( \frac{df}{dc} \right)^2 \right|_{c_{(ij)}} c_{(ij)} c_0 \right], \qquad V_{[(ij) \to (kl)]} = \frac{\Lambda}{N_{\max}} \, \bar g_{(ij)} \tag{C3} $$

Eq. (C1) in principle allows for calculation of covariances between volumes that are arbitrarily far apart. In practice, because of the finiteness of the lattice, the spatial correlations still have to be truncated at a certain distance and boundary conditions applied in order to obtain a closed system that consists of as many equations as variables.
Quantity                                 Symbol     Value
---------------------------------------  ---------  ----------------------
Lattice spacing                          δ          8.33 μm
No. of nuclei in x-direction             N_x        60
 - resulting system length               L          500 μm
Internal activator diffusion constant    D_c        3.16 μm²/s
Activator binding site length            l_c        0.01 μm
Product protein lifetime                 τ          240 s
Std. max. mean product copy no.          N_max^0    444
 - resulting typ. concentration          c_0        58.5 μm⁻³ (~ 35 nM)
 - resulting typ. diffusion constant     D_0        0.29 μm²/s
Std. input gradient length               γ_0        100 μm
Std. input gradient amplitude            c_max      = c_0

TABLE I. The standard parameter values of our model.


Appendix D: Standard parameters 


In our derivations most system parameters can be combined into two natural scales, a typical concentration c_0 (cf. Section II C) and a typical diffusion constant D_0 (cf. Section II B), and we measure concentrations and diffusion in these scales throughout our theory. To obtain numerical solutions, however, concrete numbers have to be assigned to these quantities.

Since our example application represents the bicoid-hunchback system in early Drosophila development, we chose the baseline parameter values listed in Table I as follows. First we chose lattice parameters that roughly correspond to the geometry of the Drosophila syncytium 133 ESI; in particular, this defines the lattice spacing δ. We then chose a typical value for the product protein lifetime τ, which defines the typical diffusion scale D_0 = δ²/τ.

To set the value for c_0, we took advantage of the fact that in the bicoid-hunchback system the activator (i.e. Bicoid) concentration at midembryo has been measured experimentally [Ml E3- This led us to set c_0 equal to the extrapolated maximal value of the measured in-vivo gradient at its source (x = 0). In other words, at C = c_max/c_0 = 1 the input function c(x) in our model corresponds to the experimentally measured Bicoid gradient. Finally, to define a standard value for the maximal mean output N_max, which is a key determinant of the noise powers and variances (cf. Eqs. (18), (19), (B1), (B2) and (C3)) but to date unknown, we chose typical values for the internal activator diffusion constant D_c and the binding site length l_c; from this, we calculated the standard value N_max^0 = c_0 D_c l_c τ. When varying C away from the baseline setting, we either changed c_max while holding N_max (and thus c_0) constant, or varied N_max while keeping c_max unchanged, as described in Appendix E.

Table I demonstrates that all baseline parameter values are well within a biologically realistic regime.
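The derived entries of Table I follow from the primary parameters through the relations stated above. A short check, using only the tabulated values (the variable names are our own):

```python
# Primary parameters from Table I
delta = 8.33    # lattice spacing [um]
n_x = 60        # number of nuclei in x-direction
tau = 240.0     # product protein lifetime [s]
d_c = 3.16      # internal activator diffusion constant [um^2/s]
l_c = 0.01      # activator binding site length [um]
c_0 = 58.5      # typical concentration [um^-3]

# Derived quantities
system_length = n_x * delta          # L = N_x * delta, about 500 um
d_0 = delta**2 / tau                 # D_0 = delta^2 / tau, about 0.29 um^2/s
n_max_0 = c_0 * d_c * l_c * tau      # N_max^0 = c_0 D_c l_c tau, about 444
```

The computed values agree with the tabulated system length, typical diffusion constant and standard maximal copy number to the precision given in the table.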


FIG. S1. Information capacity for varying noise type ratios, varied via N_max. Shown is the optimized information capacity as a function of the noise type ratio C = c_max/c_0 with optimal diffusive coupling (I*(C), solid blue line) and without diffusive coupling (I_0(C), dashed blue line). C was varied via N_max. At small C values, non-Poissonian input noise is dominant, while Poissonian output noise dominates for large C. The blue-shaded area depicts the maximal gain in information capacity from diffusive coupling.


Appendix E: Varying C via N_max


In our model, the noise type ratio C = c_max/c_0 = (c_max/N_max) · (D_c l_c τ) can be varied in multiple ways. In addition to altering the importance of non-Poissonian input noise via c_max, we also studied the case in which the contribution of Poissonian output noise is varied via N_max while c_max is held constant.

The main difference to the case in which c_max is scaled is that now information capacities decrease with increasing C, as this means decreasing N_max and thus enhancing output noise. The highest information capacity then is attained in the limit C → 0, i.e. N_max → ∞. In the uncoupled system, I(x;g) saturates for C → 0 towards a value dictated by the amount of (in this case) irreducible input noise, set by c_max. With spatial coupling, optimal I(x;g) values in the low-C regime are markedly higher, in accordance with the finding that spatial averaging can only enhance information capacity when input noise is dominant. As expected, for C → 0, i.e. negligible output noise, and sufficiently strong spatial coupling, i.e. strongly attenuated input noise, the capacity approaches the hard bound of the noise-free limit given by the finite number of sampling points N_x along the x-axis, I_max = log_2(N_x).
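Both quantities in this appendix are simple to evaluate. The sketch below uses the relation C = (c_max/N_max) · (D_c l_c τ) and the bound I_max = log_2(N_x) with the Table I values as defaults; the function names are our own.

```python
import math

def noise_type_ratio(c_max, n_max, d_c=3.16, l_c=0.01, tau=240.0):
    """C = c_max / c_0 with c_0 = N_max / (D_c l_c tau)."""
    return c_max * d_c * l_c * tau / n_max

def capacity_bound_bits(n_x):
    """Noise-free upper bound log2(N_x) for N_x equally likely positions."""
    return math.log2(n_x)

ratio_standard = noise_type_ratio(c_max=58.5, n_max=444)     # baseline setting
ratio_low_noise = noise_type_ratio(c_max=58.5, n_max=4440)   # tenfold larger N_max
bound = capacity_bound_bits(60)
```

At the baseline setting the ratio is close to one, and a tenfold increase of N_max at fixed c_max lowers C tenfold. For N_x = 60 the bound is just under 6 bits.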
Appendix F: Double maximum

In Fig. S2 we show three different information planes for noise-type ratio C = 0.01 and increasing values of the spatial coupling Λ (Λ = 1, Λ = 10 and Λ = 100). The heat maps demonstrate the emergence of distinct optimal regulatory strategies for Λ > 1: for sufficiently strong diffusion (Λ = 10), very steep activation curves (H ≫ 1) at lower activation thresholds K result in similar information transmission as less steep activation curves (H ≈ 1) at slightly higher K. For Λ = 100, steep activation performs better than the other strategy. This effect is seen most clearly in the 2D model (with SCA), but also appears in the 1D model at slightly different values of C and Λ.


Appendix G: Model comparison 

In Fig. S3 we demonstrate how including increasing detail affects our results for a paradigmatic case (C = 1, A = 1). The different panels show the same information plane for the 1D model with SCA, the 2D model with SCA and the 2D model without SCA, which retains longer-ranged correlations with next-nearest neighbor volumes.
FIG. S2. Emergence of a second maximum in the information plane. Here we plot for C = 0.01 and standard input gradient length, i.e. γ = 1, the positional information I(x;g) as a function of the regulatory parameters H and K for increasing strength of spatial coupling: (A) Λ = 1, (B) Λ = 10, and (C) Λ = 100. White stars mark local maxima, white boxes show the corresponding amount of information in bits. The data shown are for the 2D model with SCA.



FIG. S3. Comparison of I(x;g) for increasingly detailed versions of the spatial-stochastic model at C = 1. The reference plot for the 1D model with SCA (left) is identical to the information plane for C = 1 in Fig. 3. The same information plane is shown for the 2D model with SCA (middle), and for the full 2D model that retains next-nearest neighbor correlations (right).






























































